Abstract

We compare the long-term, steady-state performance of a variant 
of the standard Dynamic Alternative Routing (DAR) technique com- 
monly used in telephone and ATM networks, to the performance of 
a path-selection algorithm based on the "balanced-allocation" princi- 
ple |}], [l^; we refer to this new algorithm as the Balanced Dynamic 
Alternative Routing (BDAR) algorithm. While DAR checks alterna- 
tive routes sequentially until available bandwidth is found, the BDAR 
algorithm compares and chooses the best among a small number of 
alternatives. 

We show that, at the expense of a minor increase in routing over- 
head, the BDAR algorithm gives a substantial improvement in network 
performance, in terms both of network congestion and of bandwidth 
requirement. 
1 Introduction



Fast, high bandwidth, circuit switching telecommunications systems such 
as ATM and telephone networks often employ a limited path-selection algo- 
rithm in order to fully utilize the network resources while minimizing routing 
overhead. Typically, between each pair of nodes in the network there is a 
dedicated bandwidth for communication, namely, no more than a certain 
fixed number of calls can be simultaneously active between each pair of 
nodes. This dedicated bandwidth is chosen in order to satisfy the demand 
for communication between these stations. Only when this bandwidth is
exhausted does the admission control protocol try to find an alternative route
through intermediate nodes. To minimize overhead and routing delays, the
protocol checks just a small number of alternative routes; if there are no 
free connections available on any of these alternatives, then the call or com- 
munication request is rejected. Implementations that use this technique 
include the Dynamic Alternate Routing (DAR) algorithm used by British 
Telecom [7], and AT&T's Dynamic Nonhierarchical Routing (DNHR) algorithm @.

A common feature in these (and other) currently implemented protocols 
is the sequential examination of alternative routes. Only when the algorithm
examines a route and finds that it cannot be used is an alternative one examined.
The criteria for when a route can or should be used, and the method in which 
the alternative route is selected have been the subject of extensive research, 
in particular, in the context of British Telecom's DAR algorithm [6], [7];
see Kelly Q for an extensive survey. 

Dynamic routing can be viewed as a special case of the on-line load bal- 
ancing problem, where the load (incoming calls or requests) may be assigned 
to one or more servers (network links), and jobs (communication requests) 
can be scheduled only on specific subsets (paths) of the set of servers, as 
defined by the network topology. In this paper we study the impact of re- 
placing the sequential searches of the routing algorithm by a version of the 
balanced allocation principle. The basic idea is as follows: Instead of sequen- 
tially choosing alternative options (in our case, paths) until a desirable one 
is found, in the balanced-allocation regime the algorithm randomly chooses 
and examines a number of possible options, and assigns the job at hand to 
the option which appears to be the best at the time of the assignment. 

A number of papers have demonstrated the advantage of the application 
of the balanced allocation-principle § ||, ||, [n], [l8| for standard load bal- 
ancing problems, where jobs require only one server and can be executed by 
any server in the system. This research has shown that balanced allocations 
usually produce a very substantial improvement in performance, at the cost
of a small increase in overhead: Since several alternatives are examined even 
when the first alternative would have been satisfactory, the complexity of 
the routing algorithm is increased. But, as has been shown before and as 
we also demonstrate in the present context, examining even a very small 
number of alternatives (thus increasing overhead by a very small amount)
can offer great performance improvements. 

The idea of applying the balanced allocation principle to the problem
of dynamic network routing as described in this paper was first explored 
in [12]. In this context the goal is to reduce system congestion and mini- 
mize the blocking probability, that is, the probability that a call request is 
rejected. The main difficulty in applying and analyzing the balanced alloca- 
tion principle in a network setting is in handling the dependencies imposed 
by the topology of the network. The preliminary results in [12] show that
the advantage of balanced allocations is so significant that it holds even in 
the presence of a set of dependencies. 

The performance of a routing protocol can be analyzed in a static (finite, 
discrete time) or in a dynamic (infinite, continuous time) setting. The static 
case has been extensively studied in extending and strengthening the 
results in [O]. In this paper we consider the continuous-time case. The 
analysis of the continuous-time case suggested in |TJ] was based on apply- 
ing Kurtz's density-dependent jump Markov chain technique, following the 
supermarket model analysis in [17, 18]. However, since the argument there
is incomplete [1C], we present here a different analysis. Our results con- 
cern the long-term behavior of large networks employing a routing protocol 
based on the balanced allocations principle. The main tools we employ are 
a Lyapunov drift criterion used to establish the existence of a stationary dis- 
tribution for the BDAR routing protocol, and a continuous-time extension 
of the technique in || , used to analyze the stationary behavior of a network. 

Balanced allocations have also been studied in the context of queueing 
networks, where analogous results (under different asymptotic regimes than
the ones in this paper) are obtained in [O, E2L 13, 21], among others.



1.1 Model Description and Main Results 

In the types of networks considered in this paper, a logical link or "band- 
width" is reserved between each pair of stations, and an alternative route is 
only used when this logical link has already been exhausted. We model such 
a network as the complete graph $G = (V, E)$ with $|V| = n$ vertices (stations)
and $|E| = N = \binom{n}{2}$ edges (links).
The input to the system is a sequence of call requests, which are assumed
to arrive at Poisson times: new calls onto each link (i.e., between each pair of
nodes) arrive according to a Poisson process with rate $\lambda$, all arrival streams
being independent. Similarly, the duration of a call is independent of all
arrival times and all other call durations, and it is exponentially distributed
with mean $1/\mu$.

The routing algorithm has to process the calls on-line, that is, the $t$-th
request is either assigned a path or rejected before the algorithm receives
the $(t+1)$-th request. Once a call is assigned to a path, that path cannot
be changed throughout the duration of the call. We assume that each edge
has a capacity of $2B$ calls, where half of this capacity is reserved for direct
links (namely, it will only be used for call requests between these two nodes),
and the other half is reserved for being used as part of an alternative route
between two stations.

As in most of our results we consider large networks with a number $n$ of
nodes growing to infinity, we will also assume that the capacity parameter
$B$ may vary with $n$. Specifically, we assume that $B = B_n$ is nondecreasing
in $n$, and we also allow the possibility $B = \infty$.

The goal in designing an efficient routing protocol is to assign routes to 
the maximum possible number of call requests without violating the capacity 
constraints on the edges. We will compare the performance of the following 
two protocols: 

The d-Dynamic Alternative Routing (DAR) algorithm works as follows. 
When a new call request arrives, it tries to route the call through the direct 
(one-link) path. If there is no available bandwidth on the direct path, then 
the algorithm sequentially chooses alternative routes of length two and as- 
signs the call to the first available path. Up to d such choices are made, and 
they are made at random. If no possible path is found, then the request is 
rejected. 

The d-Balanced Dynamic Alternative Routing (BDAR) algorithm also 
assigns a new call request to the direct path if there is available bandwidth. 
If not, then the algorithm chooses d length-two alternative paths at random
and compares their loads (where the load of such a path is taken to be the
maximum load of the two links on that path). Then the call is assigned to
the path with the minimum load. As before, if there is no
path with free bandwidth among these d choices, then the call is rejected. 
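
The difference between the two selection rules can be made concrete with a minimal Python sketch. Everything in it is illustrative rather than taken from an actual implementation: the helper names (`edge`, `path_load`, `sample_intermediate`, `dar_route`, `bdar_route`), the dictionaries `direct` and `alt` holding the loads of the direct and alternative halves of each edge, and the assumption that the d intermediate nodes are drawn independently with replacement.

```python
import random


def edge(a, b):
    """Canonical key for the undirected edge {a, b}."""
    return (a, b) if a < b else (b, a)


def path_load(alt, u, v, w):
    """Load of the two-link path u-w-v: the larger of its two link loads."""
    return max(alt.get(edge(u, w), 0), alt.get(edge(w, v), 0))


def sample_intermediate(n, u, v, rng):
    """Pick a random intermediate node different from both endpoints."""
    return rng.choice([w for w in range(n) if w != u and w != v])


def dar_route(direct, alt, B, n, u, v, d, rng=random):
    """d-DAR: try the direct link, then up to d random paths, one at a time."""
    if direct.get(edge(u, v), 0) < B:
        return ("direct", None)
    for _ in range(d):
        w = sample_intermediate(n, u, v, rng)
        if path_load(alt, u, v, w) < B:
            return ("alternative", w)   # first path with free bandwidth wins
    return None                          # all d probes failed: call rejected


def bdar_route(direct, alt, B, n, u, v, d, rng=random):
    """d-BDAR: try the direct link, then take the least loaded of d random paths."""
    if direct.get(edge(u, v), 0) < B:
        return ("direct", None)
    candidates = [sample_intermediate(n, u, v, rng) for _ in range(d)]
    best = min(candidates, key=lambda w: path_load(alt, u, v, w))
    if path_load(alt, u, v, best) < B:
        return ("alternative", best)
    return None                          # even the best candidate is full
```

Both functions only select a route; updating the loads and releasing them when the call ends is left to the caller. DAR stops at the first candidate with free bandwidth, whereas BDAR always inspects all d candidates before committing, which is the source of the small extra routing overhead.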

The model described so far, together with one of the two protocols above, 
induces a continuous-time stochastic process describing the behavior of the 
network. As we show below, this system (for fixed n) converges to a stationary
regime exponentially fast. For our purposes, the main performance
measure is the minimum required bandwidth that ensures that, under the 
stationary distribution of the network, the blocking probability (i.e., the 
probability that a new call is rejected) is appropriately small. 

In this paper our main goal is to compare the performance of the DAR 
algorithm with that of BDAR. It is clear that BDAR's performance is dom- 
inated by its performance on alternative (length-two) routes. Therefore, 
in order to simplify the analysis, we consider a variant of BDAR, called 
BDAR*, which ignores the direct links and services each call only via an 
alternative route, making use only of the B alternative connections of each 
edge. In other words, we assume that each edge has capacity B and all of it 
is dedicated to alternative routes. We show that even though the BDAR* 
policy ignores the direct links, it has superior performance compared to 
DAR. 
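
To get a feel for the BDAR* dynamics, the process can be simulated event by event. The following is a minimal sketch under stated assumptions, not the authors' code: the function name `simulate_bdar_star` and its parameters are hypothetical, the d intermediate nodes are drawn with replacement, and `B=None` stands for unbounded capacity.

```python
import random


def simulate_bdar_star(n, d, lam, mu, t_end, B=None, seed=0):
    """Event-driven simulation of BDAR* on the complete graph with n >= 3 nodes.

    Returns (load, calls, rejected): per-edge loads, the list of active calls
    (each stored as a pair of edges), and the number of rejected requests.
    """
    rng = random.Random(seed)
    n_pairs = n * (n - 1) // 2
    load = {}        # edge -> number of calls currently routed over it
    calls = []       # active calls, each stored as (edge1, edge2)
    rejected = 0
    t = 0.0
    while True:
        arrival_rate = lam * n_pairs               # rate lam per node pair
        total_rate = arrival_rate + mu * len(calls)
        t += rng.expovariate(total_rate)           # time of the next event
        if t > t_end:
            break
        if not calls or rng.random() * total_rate < arrival_rate:
            u, v = rng.sample(range(n), 2)         # endpoints of the new call
            best = None
            for _ in range(d):                     # compare d random paths
                w = rng.randrange(n)
                while w == u or w == v:
                    w = rng.randrange(n)
                e1 = (min(u, w), max(u, w))
                e2 = (min(w, v), max(w, v))
                path = max(load.get(e1, 0), load.get(e2, 0))
                if best is None or path < best[0]:
                    best = (path, e1, e2)
            path, e1, e2 = best
            if B is not None and path >= B:
                rejected += 1                      # best candidate is full
                continue
            load[e1] = load.get(e1, 0) + 1
            load[e2] = load.get(e2, 0) + 1
            calls.append((e1, e2))
        else:
            i = rng.randrange(len(calls))          # a uniformly chosen call ends
            calls[i], calls[-1] = calls[-1], calls[i]
            e1, e2 = calls.pop()
            load[e1] -= 1
            load[e2] -= 1
    return load, calls, rejected
```

For example, `max(simulate_bdar_star(50, 2, 0.5, 1.0, 200.0)[0].values())` gives the maximum edge load at the end of one run. Because all holding times are exponential with the same rate, ending a uniformly chosen active call at total rate `mu * len(calls)` is equivalent to tracking individual departure times.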

The following result illustrates this superiority by exhibiting explicit
asymptotic bounds on their bandwidth requirements. It follows from the
results in Theorems 5 and 6.

Theorem 1. Assume that all the edges have a capacity of $2B$ links.
Under the DAR policy, edge capacity

\[ B = \Omega\left(\sqrt{\frac{\ln n}{d \ln\ln n}}\right), \quad \text{as } n \to \infty \]

is necessary to ensure that a new call is not lost with high probability.

On the other hand, if we perform the BDAR* policy (thus ignoring the $B$
direct links), edge capacity

\[ B = \frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right), \quad \text{as } n \to \infty \]

suffices to ensure that a new call is not lost with high probability.

In the above result and throughout the paper, we say that a limiting
statement holds "with high probability" (abbreviated "whp.") if it holds
with probability that is at least $1 - 1/n^c$ for some constant $c > 0$. For
example, when we say that a random variable "$X_n = O(\ln n)$ whp." we
mean that there are positive constants $C$ and $c$ such that
$\Pr(X_n \le C \ln n) \ge 1 - 1/n^c$ for all $n$ large enough. Similarly,
$X_n = o(\ln n)$ whp. means that there is a $c > 0$ such that, for all
$\epsilon > 0$, $\Pr(X_n \le \epsilon \ln n) \ge 1 - 1/n^c$ for all $n$
large enough.

Note that the result of Theorem 1 is exactly analogous to that obtained
in [11] in the discrete-time case.
2 Analysis of Balanced-Allocation Routing

This section presents the main contribution of this paper, a steady-state
analysis of the performance of the BDAR* routing algorithm. The network
is the complete graph $G$ with $n$ nodes and $N = \binom{n}{2}$ undirected edges. New calls
arrive at Poisson times with rate $\lambda$ and their durations are exponentially
distributed with mean $1/\mu$, as described earlier. As it turns out, an important
parameter in the analysis of the network load is the ratio $\rho = \lambda/\mu$.

2.1 Unbounded capacities

We first analyze the maximum load on edges when the algorithm is used on
a network with unbounded edge capacity, corresponding to $B = B_n = \infty$.
This model induces a continuous-time Markov process
$\Phi = \{\Phi(t) : t \ge 0\}$, where $\Phi(t) = (l_1(t), l_2(t), \ldots, l_N(t))$,
and each $l_i(t)$ denotes the load, at time $t$, of the $i$th link in the network.
As we show next, this Markov process has a stationary distribution $\pi_n$ to
which it converges exponentially fast, regardless of the initial state of the
network. We then prove a high probability bound on the maximum load on any
edge in the system under this stationary distribution.

Since we are only interested in the load of the alternative paths on
the edges, each state of this Markov process corresponds to the load on
edges from a collection of length-two paths. We say that a vector
$x = (l_1, l_2, \ldots, l_N)$ is a legal state if it corresponds to the load on
the $N$ edges from a collection (possibly empty) of length-two paths. The
natural state space $\Sigma$ for our process is then taken to be

\[ \Sigma = \{x = (l_1, l_2, \ldots, l_N) : l_i \in \mathbb{N},\ x \text{ is a legal state}\}. \]

The process $\Phi$ evolves on $\Sigma$ according to the model described above. This
evolution is formalized by the transition semigroup $\{P^t : t \ge 0\}$ of $\Phi$, where
$P^t(x, y)$ is simply the probability that $\Phi$ is in state $y$ at time $t$ given that it
was in state $x$ at time zero, $P^t(x, y) = \Pr\{\Phi(t) = y \mid \Phi(0) = x\}$.

Our first result shows that $\Phi$ has a stationary (or invariant) distribution
to which it converges exponentially fast. It is stated in terms of the "Lyapunov
function" $V(x)$, which is defined as 1 + (total number of active calls in
state $x$):

\[ V(x) = 1 + \frac{1}{2} \sum_{i=1}^{N} l_i. \qquad (1) \]
Theorem 2. Assume that the BDAR* algorithm is used on a network with n
nodes, each of which has infinite capacity. Then the induced Markov process
$\Phi$ has an invariant distribution $\pi_n$, and, moreover, for any initial state
$x \in \Sigma$, the distribution of $\Phi(t)$ converges to $\pi_n$ exponentially fast, namely
there is a constant $\gamma < 1$, such that

\[ \sup_y |P^t(x, y) - \pi_n(y)| \le V(x)\gamma^t, \quad \text{for all } t \ge 0 \text{ and all } x \in \Sigma. \]

Proof. Our proof uses the Lyapunov drift criterion for the exponential
ergodicity of continuous-time Markov processes [14], ||, [15]. To state our
main tool we recall a few definitions, adapted to our case of countable state
space.

The generator $\mathcal{A}$ of the process $\Phi$ is a linear operator on functions
$F : \Sigma \to \mathbb{R}$ defined by

\[ \mathcal{A}F(x) = \lim_{h \downarrow 0} \frac{E[F(\Phi(h)) \mid \Phi(0) = x] - F(x)}{h}, \]

whenever the above limit exists for all $x \in \Sigma$. The explosion time of $\Phi$ is
defined as

\[ \zeta = \sup_n J_n, \]

where

\[ J_0 = 0, \quad J_{n+1} = \inf\{t > J_n : \Phi_t \ne \Phi_{J_n}\} \]

($J_0, J_1, \ldots$ are the jump times of the Markov process). We say $\Phi$ is
nonexplosive if $\Pr(\zeta = \infty \mid \Phi_0 = x) = 1$ for any starting state $x$.

The following theorem follows from the more general results in [15],
specialized to the case of a continuous-time Markov process with a countable
state space.

Theorem 3. ([15], ||) Suppose a Markov process evolving on a countable state
space $\Sigma$ is nonexplosive, irreducible (with respect to the counting measure
on $\Sigma$) and aperiodic. If there exists a finite set $C \subset \Sigma$, constants $b < \infty$,
$\beta > 0$ and a function $V : \Sigma \to [1, \infty)$, such that

\[ \mathcal{A}V(x) \le -\beta V(x) + b \mathbf{1}_C(x), \quad x \in \Sigma, \qquad (2) \]

then the process is positive recurrent with some invariant probability measure
$\pi$, and there exist constants $\gamma < 1$, $D < \infty$ such that

\[ \sup_y |P^t(x, y) - \pi(y)| \le D V(x) \gamma^t, \quad \text{for all } t \ge 0 \text{ and all } x \in \Sigma. \]



It is easy to verify that the process is $\psi$-irreducible and aperiodic, with
the maximal aperiodicity measure $\psi$ being the counting measure on $\Sigma$.¹ Also
the process is nonexplosive since the number of new calls in a given interval
has a Poisson distribution with a finite mean, therefore the probability of
infinitely many transitions in a finite interval is 0.

To show that the drift criterion (2) can be satisfied, we use the Lyapunov
function $V(x)$ = 1 + (total number of active calls in state $x$) defined in (1)
above.

In order to compute $\mathcal{A}V$ we notice that when a new call enters the
system, it increases the loads of two edges by 1, hence the value of $V$ by
1, and when a call terminates the value of $V$ decreases by 1. Moreover,
new calls are generated with rate $\lambda N$ and calls are terminated at a rate
$\mu(V(x) - 1)$. The probability that in a time interval $h$ there are 2 or more
new calls or terminations of calls is $o(h)$.² Using these observations we can
compute $\mathcal{A}V$:

\[ \mathcal{A}V(x) = \lim_{h \downarrow 0} \frac{V(x) + \lambda N \cdot h - \mu \cdot (V(x) - 1) \cdot h + o(h) - V(x)}{h} = \lambda N - \mu V(x) + \mu. \]

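To analyze the drift condition we distinguish between the following two
cases, where the finite set $C$ can be taken to be
$C = \{x \in \Sigma : V(x) \le 2(\lambda N + \mu)/\mu\}$:

$x \in C$:

\[ \mathcal{A}V(x) = \lambda N - \mu V(x) + \mu \le -\frac{\mu}{2} V(x) + \lambda N + \mu. \]

$x \in C^c$ ($x$ is in the complement of $C$):

\[ \mathcal{A}V(x) = \lambda N - \mu V(x) + \mu \le -\mu V(x) + \frac{\mu}{2} V(x) = -\frac{\mu}{2} V(x). \]

Thus, the drift condition holds for $\beta = \mu/2$ and $b = \lambda N + \mu$. □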
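
The drift inequality (2) is easy to check numerically. The sketch below is only an illustration: it assumes that the finite set C consists of the states with $V(x) \le 2(\lambda N + \mu)/\mu$, a choice that is consistent with the two cases above, and the function names `drift` and `check_drift` are hypothetical.

```python
def drift(V, lam, N, mu):
    """Generator applied to V at a state whose Lyapunov value is V = 1 + (#calls)."""
    return lam * N - mu * (V - 1)


def check_drift(lam, N, mu, V_max):
    """Check AV <= -beta*V + b*1_C for every Lyapunov value V = 1, ..., V_max."""
    beta = mu / 2
    b = lam * N + mu
    threshold = 2 * (lam * N + mu) / mu   # assumed boundary of the finite set C
    for V in range(1, V_max + 1):
        in_C = V <= threshold
        bound = -beta * V + (b if in_C else 0.0)
        if drift(V, lam, N, mu) > bound + 1e-9:
            return False
    return True
```

Since the drift depends on the state only through $V(x)$, it suffices to loop over the integer values of $V$ instead of enumerating states.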

Having shown the existence of an invariant limiting distribution $\pi_n$, we
now analyze the maximum load on the edges under this distribution.

¹ This follows along the lines of the arguments in Chapters 4 and 5 of [16]. In particular,
note that all sets $\{y\} \subset \Sigma$ are $\nu_1$-small and $P^1(x, y) > 0$ for all $x, y \in \Sigma$, so that in fact $\Phi$
is irreducible and strongly aperiodic.

² Here and in the next expression with the notation $o(h)$ we mean that $f$ is $o(h)$ if
$\lim_{h \downarrow 0} f(h)/h = 0$. In the rest of the text $o(n)$ has the usual meaning.
Theorem 4. Consider a network with n nodes, and let $\pi_n$ be the invariant
distribution of the induced Markov process under the BDAR* policy with
unbounded edge capacity. Under $\pi_n$, the maximum number of calls in any
edge is bounded whp. by

\[ \frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right). \]
Proof. In order to compute the maximum edge load under the stationary
distribution, we start observing the system at some time point and study its
transient behavior; we then use the results to deduce the properties of the
invariant distribution. In particular, we show that there exists a
$T = O\left(n \frac{\ln\ln n}{\ln d}\right)$, such that for any state of the system at time $\tau - T$ that
has sufficiently large probability, whp. at time $\tau$ the maximum number of
calls on any edge is

\[ \frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right). \]

The high level idea is the following: We partition the time $T$ into
$\frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right)$ periods of length $O(n)$. Roughly, we argue that at the end of the
$i$-th period, whp., for each node, the number of incident edges with load
greater than $i$ is at most $\alpha_i$. The $\alpha_i$ decrease doubly exponentially, so at the
end of the last period we will be able to deduce that there are no edges with
load more than $\frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right)$ whp. The challenge is to handle the dependencies, as
the number of calls during some period depends on the number of calls of
the previous periods. We now proceed with the details.

Suppose that a call routed at time $t$ is assigned to edges $e_1$ and $e_2$. The
height of that call at edge $e_1$ is 1 plus $l_{e_1}(t-)$. We define the following
random variables:

• $L^v_{\ge i}(t)$: Number of edges incident to node $v$ with load at least $i$ at
time $t$.

• $M^v_{\ge i}(t)$: Number of calls at edges incident to $v$ with height greater or
equal to $i$ at time $t$.

Trivially we have $L^v_{\ge i}(t) \le M^v_{\ge i}(t)$.

We define the sequence of values {a^} which decreases doubly exponen- 




ts 



tially: 



a; 



(n-l)p 



2p-4 d -af_ 1 



where k = e ■ d y/2p ■ 4 d 



(n - l)^ 1 
on* = 25 Inn 
a>i* + i = 10 

Solving the recurrence we get for k < i < i* , 



for i > and > \l —n d 1 Inn, 

P 



i* is the smallest i for which aj_i < **/ — n d 1 lnn 

P 



a; 



(2p-A d )^ 
1 



(n-1) 



n — 1 



1 



V2p ' 4 d 



d ~y/2p ■ ¥ 



(n-1) 



(3) 



and for the i* 



which gives 



\f ^-n d ~ l Inn 



In Inn 
lnd 



+ o 



In lnn 
lnd 



Next we define $T = n(i^* + 2) = O\left(n \frac{\ln\ln n}{\ln d}\right)$ and an increasing sequence
of points in time: let $t_\kappa = \tau - T$ and for $i > \kappa$, $t_i = t_{i-1} + n$, so that the
end of the last period, $t_{i^*+2}$, is the current time $\tau$.

Let $E$ denote the event "at time $t_\kappa$ there are at most $(1 + \epsilon)N\rho$ calls in
the system," and let

\[ C_i = \{\forall v \in V,\ t \in [t_i, \tau] : M^v_{\ge i}(t) \le 2\alpha_i\}. \]

We will show by induction that for $i = \kappa, \ldots, i^* + 1$

\[ \Pr(\neg C_i \mid E) \le \frac{2i}{n^2}. \]

For the base case ($i = \kappa$), conditioning on $E$, the expected number of
calls for a particular node $v$ is $(1 + \epsilon)(n - 1)\rho$, since each existing call has
probability $2/n$ to have $v$ as an endpoint. Hence, by using the Chernoff
bound

\[ \Pr(\text{node } v \text{ has more than } (1 + \delta)(n - 1)\rho \text{ calls} \mid E) = o(1/n^c) \]
where $\epsilon < \delta < 1$ and $c$ can be any positive constant. Therefore

\[ \Pr(\neg C_\kappa \mid E) \le n \Pr\left(M^v_{\ge \kappa} > 2\alpha_\kappa \mid E\right) \le \frac{2\kappa}{n^2}. \]

For the induction step we assume that

\[ \Pr(\neg C_{i-1} \mid E) \le \frac{2(i - 1)}{n^2}. \]

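Let $G$ denote the event "a new call is generated with $v$ as an endpoint,"
and call $u$ the other endpoint and $w$ the intermediate node of the alternative
path. We have

\[
\begin{aligned}
\Pr(\text{a new call increases } M^v_{\ge i} \mid G, C_{i-1}, E)
&\le \Pr(\text{height of new call is} \ge i \text{ in either } (v, w) \text{ or } (w, u) \mid G, C_{i-1}, E) \\
&\le \left(\frac{L^v_{\ge i-1} + L^u_{\ge i-1}}{n - 1}\right)^d \\
&\le \left(\frac{M^v_{\ge i-1} + M^u_{\ge i-1}}{n - 1}\right)^d \quad \text{since } L^v_{\ge i}(t) \le M^v_{\ge i}(t) \\
&\le \left(\frac{4\alpha_{i-1}}{n - 1}\right)^d = q_i \quad \text{from the induction hypothesis.} \qquad (4)
\end{aligned}
\]

Notice that for $i = \kappa, \ldots, i^*$ we have

\[ q_i \le \frac{\alpha_i}{2\rho(n - 1)}. \qquad (5) \]

We now define

\[ F_i = \{\forall v \in V : M^v_{\ge i}(t_i) \le \alpha_i\} \]

and prove Lemmas 1 and 3, which allow us to conclude that $\Pr(\neg C_i \mid E) \le 2i/n^2$.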



Lemma 1. Under the inductive hypothesis 
Proof. Consider the time interval $[t_{i-1}, t_i]$ and recall that $t_i - t_{i-1} = n$.

First notice that since the duration of each call follows an exponential
distribution with parameter $\mu$, the probability that a call that is already in
the system at time $t_{i-1}$ will remain until the end of the interval $t_i$ is $e^{-n\mu}$.
Hence all these calls will end before the end of the interval with exponentially
high probability. To analyze the number of the remaining calls that were
created during the period we make use of Lemma 2, which completes the
proof of the lemma. □

Lemma 2. Consider a period of length $\Delta$ and a given node $v$. Conditioning
on $C_{i-1}$, the number of new calls that increased $M^v_{\ge i}$ when they were generated,
and remained until the end of the period is less than $\alpha_i$, with probability
at least $1 - \frac{1}{n^4}$.

Proof. Each node has $n - 1$ incident links in each of which new calls are
generated with rate $\lambda$. Conditioning on having a new request on $v$, $M^v_{\ge i}$
is increased with probability at most $q_i$. Therefore the number of calls
at time $t_i$ is stochastically dominated by that formed by a process that
generates new calls with rate $\lambda(n - 1)q_i$ which have a duration exponentially
distributed with parameter $\mu$. This process is the same as the infinite server
Poisson queue [19, page 18] in which the number of calls at the end of the
period is distributed according to a Poisson distribution with rate

\[ \lambda(n - 1) q_i \Delta p, \]

where

\[ p = \int_0^\Delta \frac{e^{-\mu x}}{\Delta}\,dx = \frac{1 - e^{-\mu\Delta}}{\mu\Delta} \le \frac{1}{\mu\Delta}. \]

So the rate is at most $\lambda q_i (n - 1)/\mu = \rho q_i (n - 1)$.
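
The survival fraction $p$ and the resulting Poisson mean can be evaluated directly. The following sketch is illustrative only; the function names are hypothetical, and the midpoint rule is included merely as a numerical sanity check of the closed form for $p$.

```python
import math


def survival_fraction(mu, delta):
    """p = (1/delta) * integral_0^delta exp(-mu*x) dx, in closed form."""
    return (1.0 - math.exp(-mu * delta)) / (mu * delta)


def poisson_mean_at_end(lam, mu, n, q, delta):
    """Mean number of thinned calls still alive at the end of a period of length delta."""
    return lam * (n - 1) * q * delta * survival_fraction(mu, delta)


def midpoint_integral(mu, delta, steps=100000):
    """Numerical value of the integral defining p, using the midpoint rule."""
    h = delta / steps
    total = sum(math.exp(-mu * (k + 0.5) * h) for k in range(steps))
    return total * h / delta
```

For any period length the mean stays below $\rho q_i (n - 1)$, because $\Delta p \le 1/\mu$; this is why the bound in the proof does not depend on $\Delta$.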
We now distinguish the following two cases: 

Case 1: For $i \le i^*$, by using Equation (5) we get that the expected number of
calls at the end of the period is at most $\alpha_i/2$, and by applying a Chernoff
bound³ for the Poisson distribution, we get that the probability that
the number of calls is higher than $\alpha_i$ is bounded by

\[ e^{-(\ln 2 - \frac{1}{2})\alpha_i}. \]

³ See for example [20, page 416].
For $i < i^*$ we have from the definition of $\alpha_i$

e -(ln2-I)a, = e -(ln2-|)^5i = ^(1*2-1) = q 

while for $i = i^*$ we get

\[ e^{-(\ln 2 - \frac{1}{2})\alpha_{i^*}} = e^{-(\ln 2 - \frac{1}{2})\,25 \ln n} = O\left(\frac{1}{n^4}\right). \]

Case 2: For $i = i^* + 1$, using Equation (4) we get that the expected number of
calls at the end of the period is at most

\[ \left(\frac{4\alpha_{i^*}}{n - 1}\right)^d \rho(n - 1) = \frac{(4 \cdot 25 \ln n)^d}{(n - 1)^{d-1}}\,\rho \]

and we get the high probability result with the Chernoff bound.

□ 

Lemma 3. Under the inductive hypothesis 
Proof. We have: 



Pv{Fi\Ci^E) 



n 



< -— — ^ =r Pr(3v€ V,t a ,t b € [t u T] : 



< 



< 



Pr(Fi 


Ci-i,E) 




MUta) = 




n 


Pr(F 


Ci-i,E) , 




MUh) = 




n 


Pr(Fi 


Ci-i,E) , 




MUtb) = 



—r f T f T Pr(M^(i a ) = at 

Jt a =U Jt b =t a 



■ I [ Pr(M^(i a ) = a, 

l-ll-C'J Jt a =ti Jt b =t a 



The probability inside the integrals is the probability that the new calls
generated during the interval $[t_a, t_b]$ that increased $M^v_{\ge i}$ and remained until the
end of the interval are at least $\alpha_i$. By applying Lemma 2, we get that this
probability is at most $n^{-4}$. Hence



n f 1 f T 1 



Pr(-.Ci | F u Ci. u E) < - r / / — dt b dt a 

1 — ^7 Jta=U Jt b =t a n 

n _ 2 1 



< 



1- \ n 4 

1 

since T = 0(n 1 ^). □ 

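Having proven the two lemmas we can now show that $\Pr(\neg C_i \mid E) \le 2i/n^2$:

\[
\begin{aligned}
\Pr(\neg C_i \mid E) &= \Pr(\neg C_i \mid C_{i-1}, E) \cdot \Pr(C_{i-1} \mid E) + \Pr(\neg C_i \mid \neg C_{i-1}, E) \cdot \Pr(\neg C_{i-1} \mid E) \\
&\le \Pr(\neg C_i \mid C_{i-1}, E) + \frac{2(i - 1)}{n^2} \\
&= \Pr(\neg C_i \mid C_{i-1}, F_i, E) \cdot \Pr(F_i \mid C_{i-1}, E) + \Pr(\neg C_i \mid C_{i-1}, \neg F_i, E) \cdot \Pr(\neg F_i \mid C_{i-1}, E) + \frac{2(i - 1)}{n^2} \\
&\le \frac{1}{n^2} + \frac{1}{n^2} + \frac{2(i - 1)}{n^2} = \frac{2i}{n^2}.
\end{aligned}
\]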

We have therefore shown that the event $C_{i^*+1}$ holds whp. until the end
of $T$, which means that for every node $v$, after the $(i^* + 1)$-th period, there
will be no more than $\alpha_{i^*+1} = 10$ incident edges with load more than $i^* + 1$.
We will now bound the probability that in the next interval ($[t_{i^*+2}, t_{i^*+3}]$,
the last interval of $T$) there will be an incident edge of $v$ with load more
than $i^* + 3$, conditioning on the event $C_{i^*+1}$. For this to happen, we must
have at least 2 new calls to be routed using one of the 10 high-loaded edges.
The probability that two specific new calls use these edges is at most

\[ \left(\frac{10}{n - 1}\right)^{2d} = O\left(\frac{1}{n^4}\right), \qquad (6) \]

since $d \ge 2$. The expected number of calls with $v$ as an endpoint is $\lambda(n - 1)n$,
since $(n - 1)$ links are connected to $v$ in each of which new calls are generated
with rate $\lambda$, while the total length of the interval is $n$. This implies that
whp. there will be $O(n^2)$ new calls in the whole period. Combining this fact
with Equation (6) and summing for all the nodes we conclude that at the end
of period $T$ there will be no edges with load more than $i^* + 3$ whp.

We now consider the stationary distribution $\pi_n$, and show that under it

\[ \Pr\left(l_{\max} \le \frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right)\right) = 1 - o(1), \]

where $l_{\max}$ denotes the maximum number of calls on any edge. Let $s(t)$ be
the state of the system at time $t$, and consider the following partitioning of
the state space of the underlying Markov process:

• $S_1$: States in which the total number of calls in the system is at most
$(1 + \epsilon)N\rho$, and the maximum load is at most $\frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right)$.

• $S_2$: States in which the total number of calls in the system is at most
$(1 + \epsilon)N\rho$, and the maximum load is at least $\frac{\ln\ln n}{\ln d} + \Omega\left(\frac{\ln\ln n}{\ln d}\right)$.

• $S_3$: States in which the total number of calls in the system is more
than $(1 + \epsilon)N\rho$.

We have shown that

\[ \Pr(s(\tau) \in S_2 \mid s(\tau - T) \in S_1 \cup S_2) = o(1) \]

and we can easily show that

\[ \Pr(s(\tau) \in S_3 \mid s(\tau - T) \in S_1 \cup S_2) = o(1). \]

Moreover in the stationary distribution the number of calls in the system
has a Poisson distribution with parameter $N\rho$. Hence by using the Chernoff
bound

\[ \sum_{i \in S_3} \pi_i = o(1). \]

Then we have

\[ \sum_{i \in S_2 \cup S_3} \pi_i = \sum_{i \in S_2} \pi_i + \sum_{i \in S_3} \pi_i. \]
The second term is $o(1)$, while for the first one

\[
\begin{aligned}
\sum_{i \in S_2} \pi_i &= \sum_j \Pr(s(\tau) \in S_2 \mid s(\tau - T) = j) \cdot \pi_j \\
&= \sum_{j \in S_1 \cup S_2} \Pr(s(\tau) \in S_2 \mid s(\tau - T) = j) \cdot \pi_j + \sum_{j \in S_3} \Pr(s(\tau) \in S_2 \mid s(\tau - T) = j) \cdot \pi_j \\
&= \sum_{j \in S_1 \cup S_2} \pi_j \cdot o(1) + o(1) = o(1).
\end{aligned}
\]

Therefore

\[ \sum_{i \in S_2 \cup S_3} \pi_i = o(1), \]

which implies that

\[ \sum_{i \in S_1} \pi_i = 1 - o(1) \]

and completes the proof of the theorem. □

2.2 Bounded Capacities 

In this section we use the analysis of the BDAR* algorithm for unbounded 
capacities to compute the bandwidth requirement B (< oo) that ensures 
that a new call is not lost whp. 

Theorem 5. Assume that all the edges have capacity $B$ circuits, which can
be a function of $n$. Then if we perform the BDAR* policy, edge capacity

\[ B = \frac{\ln\ln n}{\ln d} + o\left(\frac{\ln\ln n}{\ln d}\right), \quad \text{as } n \to \infty \]

ensures that a new call is not lost whp.

Proof. The result for finite $B$ follows from the proof of Theorem 4, which
concerns unbounded capacity. Since the Markov process is finite and ape- 
riodic there exists a stationary distribution. Moreover, the analysis for the 
unbounded case still holds for finite B as long as B < i* + 1. 

A new call will be rejected if all the $d$ choices select one of the edges with
load $i^* + 1 = \ln\ln n/\ln d + o(\ln\ln n/\ln d)$. With probability at least $1 - n^{-2}$,
for each node, the number of incident edges with load at least $i^* + 1$ is at
most $2\alpha_{i^*+1}$. Therefore the probability for a call to be rejected is no more
than



3 Lower Bound on the Performance of the DAR 
Algorithm 

To demonstrate the advantage of the balanced-allocation method we prove 
here a lower bound on the maximum channel load when requests are routed 
using the DAR algorithm. This bound shows an exponential gap between 
the capacity required by the balanced-allocation algorithm and the capac- 
ity required by the standard DAR algorithm for the same stream of inputs. 
Again we consider a complete network on $n$ nodes and $N = \binom{n}{2}$ edges.
Requests for connections between a given pair arrive according to a Poisson
process with rate $\lambda$, and the duration of a connection has an exponential
distribution with expectation $1/\mu$.

Theorem 6. Assume that all the edges have capacity 2B circuits which can 
be a function of n. Then if we perform the DAR policy, edge capacity 



is necessary to ensure that a new call is not lost whp. 

Proof. Recall that the edges have capacities 2B, capacity B is used for direct 
connections, and the remaining capacity B is used for alternative routes. We 
will compute a lower bound on the probability P = P(B), that a request 
arriving at an arbitrary time t is rejected. 

We consider first the probability $P_1$ that the new call is not routed
through the direct link. The process of routing calls through the direct link
is similar to serving customers in an M/M/B/B loss system (Poisson arrivals,
exponential service time, $B$ servers, up to $B$ customers in the system).
Applying Erlang's loss formula (e.g., 




since = 10. 



□ 
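
Erlang's loss formula for the M/M/B/B system mentioned above gives the blocking probability $\frac{\rho^B/B!}{\sum_{k=0}^{B} \rho^k/k!}$ with $\rho = \lambda/\mu$. The sketch below evaluates it with the standard numerically stable recursion, and also computes the simple lower bound obtained by replacing the denominator with $e^{\rho}$. The function names are hypothetical, and the code is an illustration rather than part of the original analysis.

```python
import math


def erlang_b(B, rho):
    """Blocking probability of an M/M/B/B loss system with offered load rho."""
    blocking = 1.0                      # with zero servers every call is blocked
    for k in range(1, B + 1):
        blocking = rho * blocking / (k + rho * blocking)
    return blocking


def erlang_b_lower_bound(B, rho):
    """exp(-rho) * rho**B / B!, which never exceeds erlang_b(B, rho)."""
    return math.exp(-rho) * rho ** B / math.factorial(B)
```

The recursion avoids computing large factorials and powers separately, which matters when $B$ grows with $n$.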
We will now estimate the probability $P_2$ that a request which was generated
at time $t$ on edge $e$ and failed to use the direct link $e$, fails to be routed
by an alternative path (i.e., all the $d$ attempts to find a non-saturated
alternative path do not succeed). To give a lower bound to the failure probability,
we consider a modified system that up to time $t$ behaves differently from
the real one by rejecting more calls than the real one. Specifically, whenever
the direct link is saturated it tries only one alternative path and if any of
the edges of the path are saturated the call is rejected. Thus, clearly more
calls are lost in the modified system and therefore fewer calls will exist at
time $t$. Notice though that from time $t$ the system behaves in the regular
way according to the DAR algorithm.

In order to estimate the probability $P_2$, we will try to lower bound the
probability that at time $t$ all the $d$ alternative paths selected as candidates
to serve the request have a saturated (as far as the bandwidth for alternative
routes is concerned) edge, which is lower bounded by the probability that all
the $d$ edges selected have the corresponding edge $e_i$ saturated (see Figure 1).
For this we consider the system at some prior time $t - \tau$, for some $\tau$ that we
will fix later. If some edge $e_i$ at that time point is saturated with $B$ calls,
then the probability that all these calls remain in $e_i$ until time $t$ equals

\[ P_{\text{remain}} = e^{-\mu B \tau}. \]

Assume now that edges $e_i$ and $e_j$ are not saturated at time $t - \tau$, and let
$e_{ij}$ be the edge that joins them. We will try to compute the probability $P_{\text{new}}$
that a request is generated during $\tau$ by $e_{ij}$, routed through the alternative
path $e_i$-$e_j$, and remained until time $t$. To simplify the argument, we ignore
any already existing calls from $e_{ij}$ routed through that path; we are allowed
to do that as these calls only increase the usage of $e_i$ and $e_j$.

We notice the following facts: 

1. All the direct connections of $e_{ij}$ are occupied at time $t - \tau$
with probability $P_1$.

2. The time of a new event from edge $e_{ij}$ (either a new call or a
termination of an existing call) is exponentially distributed with parameter
$\lambda + B\mu$. Hence the probability of the first new event taking place
in the period $\tau$ equals

$$1 - e^{-(\lambda + B\mu)\tau}.$$

3. Conditioning that there is such a new event, the probability that it is
a new call (which will have to use an alternative path if all the direct
links are occupied) is

$$\frac{\lambda}{\lambda + B\mu}.$$

4. The probability that this call is served by the path $e_i$-$e_j$ is
$1/(n-2)$.

5. The probability that the call remains in the system until time $t$ is at
least $e^{-\mu\tau}$.

Figure 1: Edge $e$ with the new request and edges of the alternative paths.

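Taking all these facts into account, and using $1/(n-2) \ge 1/n$, we deduce
that the probability that at time $t$ we have a call from the edge $e_{ij}$
in $e_i$ and $e_j$ is at least

$$P_1 \cdot \left(1 - e^{-(\lambda + B\mu)\tau}\right) \cdot \frac{\lambda}{\lambda + B\mu} \cdot \frac{1}{n} \cdot e^{-\mu\tau} \;\ge\; e^{-\lambda/\mu}\,\frac{(\lambda/\mu)^B}{B!} \left(1 - \frac{1}{e}\right) \frac{\lambda}{\lambda + B\mu} \cdot \frac{1}{n} \cdot e^{-\mu},$$

where we have selected $\tau = 1$ and hence $e^{-(\lambda + B\mu)\tau} < 1/e$
for large enough $n$. We use this lower bound as the value of
$P_{\mathrm{new}}$ below.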
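The product of the five facts can be evaluated directly. The sketch below
is only an illustration: the function name `p_new_lower_bound` and its
parameters are hypothetical, the factor for fact 1 uses the Erlang lower
bound $e^{-\lambda/\mu}(\lambda/\mu)^B/B!$ in place of $P_1$, and the factor
for fact 4 is weakened from $1/(n-2)$ to $1/n$.

```python
import math


def p_new_lower_bound(lam: float, mu: float, B: int, n: int,
                      tau: float = 1.0) -> float:
    """Lower bound on the probability that a diverted call from e_ij
    occupies the path e_i - e_j at time t (product of facts 1 to 5)."""
    a = lam / mu
    p1 = math.exp(-a) * a**B / math.factorial(B)     # fact 1: direct links full
    p_event = 1.0 - math.exp(-(lam + B * mu) * tau)  # fact 2: an event within tau
    p_call = lam / (lam + B * mu)                    # fact 3: the event is a new call
    p_path = 1.0 / n                                 # fact 4, weakened from 1/(n-2)
    p_stay = math.exp(-mu * tau)                     # fact 5: call survives until t
    return p1 * p_event * p_call * p_path * p_stay
```

Every factor other than `p_path` is at most 1, so the returned value never
exceeds 1/n, and for fixed rates and bandwidth it scales exactly as 1/n.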

For each edge $e_j$ there are $n - 3$ potential sources that are mutually
independent. Notice however that if an edge $e_k$ is saturated then a
diverted call from edge $e_{jk}$ that selected the path $e_j$-$e_k$ will be
rejected and will not contribute to the increase of the load of $e_j$. We
perform the above counting as long as there are at least $n/2 - 1$
non-saturated edges $e_i$. Then the probability that an edge $e_j$ is
saturated at time $t$ is at least the minimum of $P_{\mathrm{remain}}$ and

$$P_{\mathrm{full}} = \binom{n/2 - 1}{B} P_{\mathrm{new}}^{B} \left(1 - P_{\mathrm{new}}\right)^{n/2 - 1 - B},$$

and that minimum is always equal to $P_{\mathrm{full}}$. Notice that a
trivial upper bound for $P_{\mathrm{new}}$ is $1/n$, so

$$P_{\mathrm{full}} \ge \binom{n/2 - 1}{B} P_{\mathrm{new}}^{B} \left(1 - \frac{1}{n}\right)^{n - B}.$$
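To get a feeling for the size of this saturation probability, the following
sketch treats it as a binomial probability: exactly B of n/2 - 1 independent
sources each contribute a call with probability P_new. That reading, the
function names, and the sample numbers in the usage note are assumptions made
for illustration, not values from the analysis.

```python
import math


def p_full(n: int, B: int, p_new: float) -> float:
    """Probability that exactly B of the n/2 - 1 sources place a call."""
    m = n // 2 - 1
    return math.comb(m, B) * p_new**B * (1.0 - p_new) ** (m - B)


def p_full_weakened(n: int, B: int, p_new: float) -> float:
    """Same quantity with (1 - p_new)^(m - B) replaced by (1 - 1/n)^(n - B).

    This is a valid lower bound on p_full whenever p_new <= 1/n.
    """
    m = n // 2 - 1
    return math.comb(m, B) * p_new**B * (1.0 - 1.0 / n) ** (n - B)
```

With n = 100, B = 3 and p_new = 0.005, the weakened expression is roughly
half of the exact binomial value, so the simplification costs only a
constant factor.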

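Let us now compute the probability $P_2$. There are at least $n/2 - 1$ edges
$e_j$ whose probability of being saturated is at least $P_{\mathrm{full}}$,
hence the probability that all the $d$ alternative paths selected contain one
of the saturated edges $e_j$ is lower bounded by

$$P_2 \ge \frac{1}{2^d} \left[ \binom{n/2 - dB - 1}{B} P_{\mathrm{new}}^{B} \left(1 - \frac{1}{n}\right)^{n} \right]^{d},$$

where the extra term $-dB$ is needed to avoid dependences between the
different edges $e_j$. Substituting the value for $P_{\mathrm{new}}$ we get

$$P_2 \ge \frac{1}{2^d} \binom{n/2 - dB - 1}{B}^{d} \frac{e^{-dB\lambda/\mu} (\lambda/\mu)^{dB^2}}{(B!)^{dB}} \left(1 - \frac{1}{e}\right)^{dB} \left(\frac{\lambda}{\lambda + B\mu}\right)^{dB} \frac{e^{-\mu dB}}{n^{dB}} \left(1 - \frac{1}{n}\right)^{dn}.$$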

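Therefore the probability that the call generated at time $t$ is rejected is
at least

$$\begin{aligned}
P_1 P_2 \ge{}& e^{-\lambda/\mu}\,\frac{(\lambda/\mu)^B}{B!} \cdot \frac{1}{2^d} \binom{n/2 - dB - 1}{B}^{d} \frac{e^{-dB\lambda/\mu} (\lambda/\mu)^{dB^2}}{(B!)^{dB}} \\
&\cdot \left(1 - \frac{1}{e}\right)^{dB} \left(\frac{\lambda}{\lambda + B\mu}\right)^{dB} \frac{e^{-\mu dB}}{n^{dB}} \left(\frac{1}{3}\right)^{d} \\
={}& e^{-O\left(dB^2 \ln B - dB^2 \ln(\lambda/\mu)\right)}.
\end{aligned}$$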



Therefore, in order to guarantee that a new call is not lost whp, the
bandwidth must be at least

$$B = \Omega\left(\sqrt{\frac{\ln n}{d \ln \ln n}}\right).$$



□ 
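The order of this requirement can be illustrated numerically. The sketch
below assumes that the rejection probability behaves like
$e^{-c\,dB^2 \ln B}$ for some constant $c$ and asks for the smallest $B$ that
makes it at most $1/n$. The constant `c` (set to 1 here) and both function
names are hypothetical, since the analysis only fixes the exponent up to the
constant hidden in the $O(\cdot)$ notation.

```python
import math


def min_bandwidth(n: int, d: int, c: float = 1.0) -> int:
    """Smallest integer B >= 2 with c * d * B^2 * ln(B) >= ln(n)."""
    B = 2
    while c * d * B * B * math.log(B) < math.log(n):
        B += 1
    return B


def asymptotic_scale(n: int, d: int) -> float:
    """The scale sqrt(ln n / (d ln ln n)) appearing in the lower bound."""
    return math.sqrt(math.log(n) / (d * math.log(math.log(n))))
```

For n = 10^6 and d = 1 this gives `min_bandwidth` equal to 4 against a scale
of about 2.3, which matches up to a constant factor, and increasing d lowers
both quantities.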






References 



[1] G. R. Ash, R. H. Cardwell, and R. P. Murray. Design and optimization
of networks with dynamic routing. BSTJ, 60(8):1787-1820, 1981.

[2] Y. Azar, A. Broder, A. Karlin, and E. Upfal. Balanced allocations. In
Proceedings of the 26th ACM Symposium on the Theory of Computing,
pages 593-602, 1994.

[3] Y. Azar, A. Z. Broder, A. R. Karlin, and E. Upfal. Balanced allocations.
SIAM Journal on Computing, 29(1):180-200, Feb. 2000.

[4] A. Z. Broder, A. Frieze, C. Lund, S. Phillips, and N. Reingold.
Balanced allocations for tree-like inputs. Information Processing Letters,
55(6):329-332, Sept. 1995.

[5] D. Down, S. P. Meyn, and R. Tweedie. Exponential and uniform
ergodicity of Markov processes. Ann. Probab., 23(4):1671-1691, 1996.

[6] R. J. Gibbens, P. J. Hunt, and F. P. Kelly. Bistability in communication
networks. In G. R. Grimmett and D. J. A. Welsh, editors, Disorder in
Physical Systems, pages 113-128. Oxford Univ. Press, New York, 1990.

[7] R. J. Gibbens, F. P. Kelly, and P. B. Key. Dynamic alternative routing.
In M. E. Steenstrup, editor, Routing in Communications Networks,
pages 13-47. Prentice Hall, 1995.

[8] P. J. Hunt and C. N. Laws. Asymptotically optimal loss network
control. Mathematics of Operations Research, 18(4):880-900, 1993.

[9] F. P. Kelly. Loss networks. Annals of Applied Probability, 1(3):319-378,
1991.

[10] M. J. Luczak. Probability, Algorithms and Telecommunication Systems.
DPhil thesis, Oxford University, 2000.

[11] M. J. Luczak, C. McDiarmid, and E. Upfal. On-line routing of random
calls in networks. Probability Theory and Related Fields, 2002. To
appear.

[12] M. J. Luczak and E. Upfal. Reducing network congestion and
blocking probability through balanced allocation. In IEEE Symposium on
Foundations of Computer Science, pages 587-595, 1999.






[13] J. Martin and Y. Suhov. Fast Jackson networks. Ann. Appl. Probab., 
9(3):854-870, 1999. 

[14] S. P. Meyn and R. Tweedie. Stability of Markovian processes III: Foster- 
Lyapunov criteria for continuous-time processes. Adv. Appl. Probab., 
25:518-548, 1993. 

[15] S. P. Meyn and R. Tweedie. A survey of Foster-Lyapunov techniques for
general state space Markov processes. In Proceedings of the Workshop
on Stochastic Stability and Stochastic Stabilization, Metz, France, June
1993. Springer-Verlag, 1994.

[16] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic
Stability. Communications and Control Engineering Series. Springer-Verlag,
London, New York, 1993.

[17] M. Mitzenmacher. The Power of Two Choices in Randomized Load 
Balancing. PhD thesis, University of California, Berkeley, August 1996. 

[18] M. Mitzenmacher. On the analysis of randomized load balancing
schemes. In Proceedings of the 9th Annual ACM Symposium on
Parallel Algorithms and Architectures, pages 292-301, Newport, Rhode
Island, June 22-25, 1997. SIGACT/SIGARCH and EATCS. Extended
abstract.

[19] S. M. Ross. Applied Probability Models with Optimization Applications. 
Dover Publications, Reprint, 1970. 

[20] S. M. Ross. A First Course in Probability. Macmillan, London, 5th 
edition, 1998. 

[21] Y. Suhov and N. Vvedenskaya. Fast Jackson networks with dynamic 
routing. Problems of Information Transmission, 38(2):136-153, 2002. 

[22] N. Vvedenskaya, R. Dobrushin, and F. Karpelevich. A queueing system
with a choice of the shorter of two queues - an asymptotic approach.
Problemy Peredachi Informatsii, 32(1):20-34, 1996.






